Appendix D — Assignment D

Matrix approach & MLR

Instructions

You may talk to a friend and discuss the questions and potential directions for solving them. However, you must write your own solutions and code individually, not as a group activity.
Use R code chunks for your code, and type your answers outside the code chunks. Make sure the solutions are written neatly enough to understand and grade.
Render the file as HTML to submit. For theoretical questions, you can either type the solutions in this file, or write them on paper, scan them, and submit them separately.
The assignment is worth 100 points and is due on 5th November 2023 at 11:59 pm.
Five points are for properly formatting the assignment. The breakdown is as follows:

Must be an HTML file rendered using Quarto (the theory part may be scanned and submitted separately) (2 pts).
There are no excessively long outputs of extraneous information (e.g., no printouts of entire data frames without good reason, no long printouts of which iteration a loop is on, no long sections of commented-out code). There is no unnecessary or redundant code and no unnecessary or redundant text (1 pt).
Final answers to each question are written clearly (1 pt).
The proofs are legible and clearly written, with reasoning provided for every step, so that they are easy to follow and understand (1 pt).

D.1 OLS estimator bias

Show that the least squares estimator \(\hat{\beta} = (X^TX)^{-1}X^TY\) is unbiased.

(3 points)
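The proof itself is for you to write. As an optional numerical sanity check, the Python/NumPy sketch below simulates many error vectors and averages the resulting OLS estimates. Python is used only for illustration, since the assignment asks for R. The sketch holds the design matrix \(X\) fixed and assumes \(E(\epsilon) = 0\). The sample size, coefficients and error scale are arbitrary assumptions.

```python
import numpy as np

def mean_ols_estimate(beta, n=50, sigma=2.0, reps=20_000, seed=1):
    """Average beta_hat = (X^T X)^{-1} X^T Y over many simulated error vectors.

    The design matrix X is generated once and then held fixed, matching the
    usual assumption that X is non-random; only the errors are redrawn.
    """
    rng = np.random.default_rng(seed)
    p = len(beta)
    X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])
    eps = rng.normal(0.0, sigma, size=(n, reps))   # E(eps) = 0, one column per replicate
    Y = (X @ beta)[:, None] + eps                  # n x reps matrix of responses
    B = np.linalg.solve(X.T @ X, X.T @ Y)          # p x reps matrix of OLS estimates
    return B.mean(axis=1)

beta = np.array([1.0, -0.5, 3.0])                  # hypothetical true coefficients
print(mean_ols_estimate(beta))                     # should be close to beta
```

The sketch solves \((X^TX)b = X^TY\) instead of forming the inverse explicitly. This is the numerically preferred way to evaluate the same estimator.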

D.2 Variance-covariance of fitted values

Obtain the expression for the variance-covariance matrix of the fitted values \(\hat{Y}_i, i = 1,...,n\) in terms of the hat matrix \(H\).

(3 points)
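You can use the small Python/NumPy simulation below to check the expression you derive. It is an illustration only, under assumed values \(n = 8\) and \(\sigma = 1.5\) with a simulated design. The sketch computes the hat matrix \(H = X(X^TX)^{-1}X^T\). It then redraws the errors many times with \(X\) fixed and estimates the covariance matrix of \(\hat{Y} = HY\) empirically.

```python
import numpy as np

def hat_matrix(X):
    """H = X (X^T X)^{-1} X^T, using a linear solve rather than an explicit inverse."""
    return X @ np.linalg.solve(X.T @ X, X.T)

rng = np.random.default_rng(2)
n, sigma = 8, 1.5                                   # small hypothetical example
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([2.0, 1.0])
H = hat_matrix(X)

# Redraw the errors many times with X fixed; each column of Y_hat = H Y is one
# simulated vector of fitted values (Y_hat_1, ..., Y_hat_n).
eps = rng.normal(0.0, sigma, size=(n, 200_000))
Y_hat = H @ ((X @ beta)[:, None] + eps)
empirical_cov = np.cov(Y_hat)                       # n x n; rows are treated as variables

print(np.round(H, 3))
print(np.round(empirical_cov, 3))
```

Compare `empirical_cov` with your derived expression evaluated at this \(H\) and \(\sigma\).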

D.3 Maximum likelihood estimation for Generalized Least Squares

The density of the \(n\)-dimensional multivariate normal distribution with mean vector \(\mu\) and variance-covariance matrix \(\Sigma\) is:

\(f(Y) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\bigg[-\frac{1}{2}(Y-\mu)^T\Sigma^{-1}(Y-\mu)\bigg]\)

For the linear regression model:

\(Y = X\beta+\epsilon, \epsilon \sim N(0, \Sigma)\),

derive the maximum likelihood estimate of the regression coefficient vector \(\beta\).

What is the hat matrix?

Note: The variance-covariance matrix of the error term is \(\Sigma\), so the error terms may be correlated and their variances need not be constant. Derive the estimate for this general scenario.

(6 + 2 = 8 points)
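One standard way to think about this model is "whitening". The Python/NumPy sketch below illustrates that idea and can serve as a numerical check on your closed-form estimate; it is not a substitute for the derivation. It assumes \(\Sigma\) is known and positive definite. It factors \(\Sigma = LL^T\) (Cholesky) and transforms the data by \(L^{-1}\), which gives errors with identity covariance. It then runs ordinary least squares on the transformed data. The function name `whitened_ols` and the AR(1)-type \(\Sigma\) are hypothetical choices made for illustration.

```python
import numpy as np

def whitened_ols(X, Y, Sigma):
    """Estimate beta in Y = X beta + eps, eps ~ N(0, Sigma), with Sigma treated as known.

    With the Cholesky factorization Sigma = L L^T, pre-multiplying by L^{-1}
    gives L^{-1} Y = (L^{-1} X) beta + L^{-1} eps, where L^{-1} eps ~ N(0, I).
    Ordinary least squares on the transformed data then maximizes the likelihood.
    """
    L = np.linalg.cholesky(Sigma)      # lower triangular, Sigma = L @ L.T
    X_w = np.linalg.solve(L, X)        # L^{-1} X
    Y_w = np.linalg.solve(L, Y)        # L^{-1} Y
    beta_hat, *_ = np.linalg.lstsq(X_w, Y_w, rcond=None)
    return beta_hat

# Hypothetical example: correlated errors (AR(1)-type, rho = 0.6) with unequal variances.
rng = np.random.default_rng(3)
n = 30
X = np.column_stack([np.ones(n), rng.uniform(0, 5, n)])
idx = np.arange(n)
sd = rng.uniform(0.5, 2.0, n)
Sigma = np.outer(sd, sd) * 0.6 ** np.abs(idx[:, None] - idx[None, :])
Y = X @ np.array([1.0, 2.0]) + np.linalg.cholesky(Sigma) @ rng.normal(size=n)

print(whitened_ols(X, Y, Sigma))
```

Evaluating your derived formula on the same `X`, `Y` and `Sigma` should reproduce this estimate to floating-point precision.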

D.4 General Linear Regression model

For each of the following regression models, indicate whether it can be transformed into a general linear regression model. Assume \(\epsilon_i \sim N(0, \sigma^2)\). Justify your answer, i.e., mention the appropriate transformation(s). Note that the assumption \(\epsilon_i \sim N(0, \sigma^2)\) needs to hold for a general linear regression model.

\(Y_i = \beta_0 + \beta_1X_{i1}+\beta_2\log(X_{i2})+\beta_3X_{i1}^2+\epsilon_i\)
\(Y_i = \epsilon_i\exp(\beta_0+\beta_1X_{i1}+\beta_2X_{i2}^2)\)
\(Y_i = \log(\beta_1X_{i1})+\beta_2X_{i2}+\epsilon_i\)
\(Y_i = \beta_0\exp(\beta_1X_{i1})+\epsilon_i\)
\(Y_i = [1 + \exp(\beta_0+\beta_1X_{i1}+\epsilon_i)]^{-1}\)

(5x2 = 10 points)
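The Python/NumPy sketch below illustrates the kind of transformation this question asks about. It uses a model that is not in the list above: \(Y_i = \beta_0 X_i^{\beta_1}\exp(\epsilon_i)\). Taking logs gives \(\log Y_i = \log\beta_0 + \beta_1\log X_i + \epsilon_i\), which is linear in the parameters \((\log\beta_0, \beta_1)\) with an additive normal error. The parameter values and simulated data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
b0, b1 = 3.0, 0.7                         # hypothetical true parameters
x = rng.uniform(1, 10, n)
eps = rng.normal(0.0, 0.2, n)
y = b0 * x**b1 * np.exp(eps)              # nonlinear in b0, multiplicative error

# Taking logs gives log(y) = log(b0) + b1*log(x) + eps: linear in the parameters
# (log(b0), b1), with an additive N(0, sigma^2) error, so OLS applies.
Z = np.column_stack([np.ones(n), np.log(x)])
coef, *_ = np.linalg.lstsq(Z, np.log(y), rcond=None)
b0_hat, b1_hat = np.exp(coef[0]), coef[1]
print(b0_hat, b1_hat)
```

The transformation only works because the error enters multiplicatively through \(\exp(\epsilon_i)\). An additive error on the original scale would not become additive after taking logs.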

D.5 \(Cor(Y_i, \hat{Y}_i) = \sqrt{R^2}\)

For the multiple linear regression model, show that the square of the correlation between the response \(Y_i\) and the fitted values \(\hat{Y}_i\) is the coefficient of determination \(R^2\).

Hint:

For random variables \(a\) and \(b\):

\(Cov(a, a) = Var(a)\),

\([Cor(a, b)]^2 = \frac{[Cov(a,b)]^2}{Var(a)Var(b)}\).

For the linear regression model:

\(Y = \hat{Y}+e\), where \(e = Y-\hat{Y}\) is the vector of residuals.

(8 points)
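The Python/NumPy sketch below fits a multiple regression to simulated data and computes both \(R^2 = 1 - SSE/SST\) and the squared sample correlation between \(Y\) and \(\hat{Y}\). It is a numerical check only, not a proof. The predictors and coefficients are hypothetical. The fitted model includes an intercept, which the identity relies on.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])   # intercept + 3 simulated predictors
Y = X @ np.array([1.0, 0.5, -2.0, 0.0]) + rng.normal(0, 1.5, n)

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ beta_hat
SSE = np.sum((Y - Y_hat) ** 2)
SST = np.sum((Y - Y.mean()) ** 2)
R2 = 1 - SSE / SST
r = np.corrcoef(Y, Y_hat)[0, 1]           # sample correlation between Y and Y_hat
print(R2, r**2)                           # the two values should agree
```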

D.6 Developing the MLR model

Read the dataset house_prices.csv.

D.6.1

Make a pairplot. Which predictors seem to be useful to predict house_price? Ignore house_id while making the pairplot.

(2+2 = 4 points)

D.6.2

Print the pairwise correlation matrix. Which predictors seem to be useful to predict house_price? Ignore house_id while printing the correlation matrix.

(2+2 = 4 points)
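In R this is typically a call to `cor()` on the data frame with the house_id column removed. The Python/NumPy sketch below shows the same idea on a small made-up dataset; it does not use house_prices.csv and contains none of its values. The helper name `correlation_matrix` and the toy columns are hypothetical.

```python
import numpy as np

def correlation_matrix(data, exclude=("house_id",)):
    """Pairwise Pearson correlations for every column in `data` (a dict of
    equal-length arrays), skipping identifier columns such as house_id."""
    names = [k for k in data if k not in exclude]
    R = np.corrcoef(np.vstack([np.asarray(data[k], dtype=float) for k in names]))
    return names, R

# Hypothetical toy data standing in for house_prices.csv (values are made up).
rng = np.random.default_rng(6)
n = 200
dist = rng.exponential(1000, n)
data = {
    "house_id": np.arange(1, n + 1),
    "distance_MRT": dist,
    "house_age": rng.uniform(0, 40, n),
    "house_price": 50 - 0.01 * dist + rng.normal(0, 5, n),
}
names, R = correlation_matrix(data)
for name, row in zip(names, R):
    print(f"{name:>15}", np.round(row, 2))
```

The identifier is dropped because its correlation with the response carries no predictive meaning.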

D.6.3

Develop a linear regression model to predict house_price based on house_age, distance_MRT, number_convenience_stores, and latitude. Report the model's \(R^2\) and \(R^2_{adj}\).

(2+2 = 4 points)
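In R, `summary(lm(...))` reports both quantities as "Multiple R-squared" and "Adjusted R-squared". The Python/NumPy sketch below shows how they are computed, using \(R^2_{adj} = 1 - (1-R^2)\frac{n-1}{n-k-1}\) for \(k\) predictors. The predictor columns are simulated stand-ins and are not the values in house_prices.csv. The function name `fit_mlr` and the coefficients are hypothetical.

```python
import numpy as np

def fit_mlr(X_pred, y):
    """Least-squares fit of y = b0 + X_pred @ b + e.

    X_pred is an n x k matrix of predictors without an intercept column.
    Returns (coefficients, R^2, adjusted R^2), where
    R^2_adj = 1 - (1 - R^2) (n - 1) / (n - k - 1).
    """
    n, k = X_pred.shape
    X = np.column_stack([np.ones(n), X_pred])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return coef, r2, r2_adj

# Simulated stand-ins for house_age, distance_MRT, number_convenience_stores and
# latitude (arbitrary scale); these are NOT the values in house_prices.csv.
rng = np.random.default_rng(7)
n = 300
X_pred = np.column_stack([
    rng.uniform(0, 40, n),        # house_age
    rng.exponential(1000, n),     # distance_MRT
    rng.integers(0, 11, n),       # number_convenience_stores
    rng.normal(0, 1, n),          # latitude (standardized, hypothetical)
])
y = (40 - 0.2 * X_pred[:, 0] - 0.005 * X_pred[:, 1] + X_pred[:, 2]
     + 2 * X_pred[:, 3] + rng.normal(0, 5, n))
coef, r2, r2_adj = fit_mlr(X_pred, y)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {r2_adj:.3f}")
```

The adjusted value penalizes the number of predictors, so it is always below \(R^2\) when \(R^2 < 1\).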

D.6.4

Make the diagnostic plots to verify if the model satisfies the assumptions of:

Linear relationship
Homoscedasticity
Normal distribution of errors

Also verify the assumptions of homoscedasticity and normal distribution of errors with statistical tests.

(2x3 + 2 = 8 points)
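In R, common choices are `lmtest::bptest()` (Breusch-Pagan) for homoscedasticity and `shapiro.test()` (Shapiro-Wilk) on the residuals for normality. The Python/NumPy sketch below illustrates what such tests compute, using hypothetical simulated data whose error SD grows with the first predictor. Two choices differ from those R functions. First, it uses Koenker's studentized Breusch-Pagan statistic (\(n R^2\) from regressing squared residuals on the predictors). Second, it swaps in the Jarque-Bera normality test in place of Shapiro-Wilk, because its statistic has a simple closed form. The p-values use the closed-form chi-square survival function, which is valid only for an even number of degrees of freedom.

```python
import math
import numpy as np

def chi2_sf_even(x, df):
    """P(chi^2_df > x) = exp(-x/2) * sum_{j < df/2} (x/2)^j / j!, valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))

def breusch_pagan(X_pred, resid):
    """Koenker's studentized Breusch-Pagan statistic: regress resid^2 on the
    predictors (plus intercept); LM = n * R^2, approx. chi^2 with k df under
    homoscedasticity. Returns (LM, df)."""
    n, k = X_pred.shape
    Z = np.column_stack([np.ones(n), X_pred])
    u = resid**2
    g, *_ = np.linalg.lstsq(Z, u, rcond=None)
    r2 = 1 - np.sum((u - Z @ g) ** 2) / np.sum((u - u.mean()) ** 2)
    return n * r2, k

def jarque_bera(resid):
    """Jarque-Bera statistic JB = n/6 * (S^2 + (K - 3)^2 / 4), approx. chi^2 with
    2 df under normality; S and K are the sample skewness and kurtosis."""
    e = resid - resid.mean()
    m2 = np.mean(e**2)
    S = np.mean(e**3) / m2**1.5
    K = np.mean(e**4) / m2**2
    return e.size / 6 * (S**2 + (K - 3) ** 2 / 4)

# Hypothetical data whose error SD grows with the first predictor.
rng = np.random.default_rng(8)
n = 500
X_pred = rng.uniform(0, 1, size=(n, 4))
y = 1 + X_pred @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(0, 0.5 + 3 * X_pred[:, 0])
X = np.column_stack([np.ones(n), X_pred])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

lm, df = breusch_pagan(X_pred, resid)
jb = jarque_bera(resid)
print(f"Breusch-Pagan: LM = {lm:.2f}, p = {chi2_sf_even(lm, df):.4g}")
print(f"Jarque-Bera:   JB = {jb:.2f}, p = {chi2_sf_even(jb, 2):.4g}")
```

Both tests complement the diagnostic plots rather than replace them. With large \(n\), they can flag departures that are too small to matter in practice.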

D.6.5

Plot the residuals against each predictor. For each predictor, comment on whether it seems to have a linear relationship with the response and whether the error variance seems constant.

(4x2 = 8 points)

D.6.6

Given the analysis in the previous plots (in D.6.4 and D.6.5), will it be appropriate to transform the predictors or the response or both? Why? If both, then which should you transform first?

(1+2+1 = 4 points)

D.6.7

Use the Box-Cox procedure to transform the model. Is the Box-Cox model an improvement over the previous model with regard to goodness-of-fit?

(2+2 = 4 points)
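In R, `MASS::boxcox()` plots the profile log-likelihood of \(\lambda\) for a fitted `lm` model. The Python/NumPy sketch below shows what is being maximized. It applies the transformation \(Y^{(\lambda)} = (Y^\lambda - 1)/\lambda\), or \(\log Y\) at \(\lambda = 0\), to a strictly positive response. It then refits OLS and evaluates \(-\frac{n}{2}\log(RSS(\lambda)/n) + (\lambda - 1)\sum\log Y_i\) over a grid of \(\lambda\) values. The grid \([-2, 2]\) and the simulated log-linear data are assumptions made for illustration.

```python
import numpy as np

def boxcox_transform(y, lam):
    """Box-Cox transform of a positive response: (y^lam - 1)/lam, or log(y) at lam = 0."""
    return np.log(y) if abs(lam) < 1e-12 else (y**lam - 1) / lam

def boxcox_profile_loglik(X, y, lambdas):
    """Profile log-likelihood (up to an additive constant) of lambda for the model
    y^(lam) = X beta + e, e ~ N(0, sigma^2 I):
        l(lam) = -n/2 * log(RSS(lam) / n) + (lam - 1) * sum(log y),
    where RSS(lam) is the OLS residual sum of squares of the transformed response
    and the last term is the log-Jacobian of the transformation."""
    n = y.size
    log_jacobian = np.sum(np.log(y))
    values = []
    for lam in lambdas:
        z = boxcox_transform(y, lam)
        coef, *_ = np.linalg.lstsq(X, z, rcond=None)
        rss = np.sum((z - X @ coef) ** 2)
        values.append(-n / 2 * np.log(rss / n) + (lam - 1) * log_jacobian)
    return np.array(values)

# Hypothetical data for which log(y) is linear in x, so lambda near 0 should be chosen.
rng = np.random.default_rng(9)
n = 500
x = rng.uniform(0, 4, n)
y = np.exp(1 + 0.5 * x + rng.normal(0, 0.1, n))
X = np.column_stack([np.ones(n), x])

lambdas = np.linspace(-2, 2, 81)
ll = boxcox_profile_loglik(X, y, lambdas)
lam_hat = lambdas[np.argmax(ll)]
print(f"lambda maximizing the profile log-likelihood: {lam_hat:.2f}")
```

The Jacobian term matters. Without it, the RSS values for different \(\lambda\) are on different scales and cannot be compared.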

D.6.8

Make diagnostic plots to verify if the Box-Cox model developed in the previous question satisfies the 3 assumptions mentioned in D.6.4. Is the Box-Cox model an improvement over the previous model with regard to the assumptions? Explain.

(2+2 = 4 points)

D.6.9

Plot the residuals against each predictor for the Box-Cox model developed in D.6.7. For each predictor, comment on whether it seems to have a linear relationship with the response and whether the error variance seems constant.

(4x2 = 8 points)

D.6.10

Based on the above plots, transform two predictors of your choice to improve how well the model satisfies the linearity assumption with respect to those predictors. Explain the intuition behind each transformation.

(2 + 2 = 4 points)

D.6.11

Make the diagnostic plots, and comment on whether the model developed in D.6.10 improves further over the Box-Cox model developed in D.6.7 with regard to the 3 model assumptions and goodness-of-fit, and whether it seems to satisfy the 3 assumptions.

(3x2 + 2 + 3 = 11 points)